(54) System and method for network monitoring

(57) An algorithmic snoop unit (28) snoops interleaved
transactions over a shared bus (42) as data is transmitted via
transactions between clients (34, 36, 38) coupled to the shared
bus, and executes various algorithms upon data snooped from the
transactions. The unit includes one or more algorithmic entries
(46, 48, 50) along with an algorithmic engine (44). Each
algorithmic entry includes a client ID register that identifies
the client associated with a transaction, a starting address
register and an ending address register that define the address
range upon which an algorithm will be executed, a read or write
flag that identifies whether the transaction is a read or write
operation, an encryption key register for holding an encryption
key, a decryption key register for holding a decryption key, an
algorithm ID register for identifying an algorithm to be
executed, a status/control register which holds various status
and control bits, an accumulator for accumulating results from
the execution of the algorithm, a temporary storage area, and
one or more memory pointers that index a location in memory for
results comprising a large amount of data. If a snooped
transaction matches an algorithmic entry, the algorithm
identified by the algorithm ID register of that entry is
executed upon the data carried by the transaction.

Description

[0001] The present invention relates to algorithms that perform
functions upon data streams as the data streams are carried by
bus transactions, such as checksum functions that verify
integrity and cryptographic functions that provide security.
More specifically, the present invention snoops interleaved
transactions over a shared bus architecture as data is
transmitted via transactions between clients coupled to the
shared bus, and executes various algorithms upon data snooped
from the transactions.

[0002] In the art of computer networking, protocol stacks are
commonly used to transmit data between network nodes that are
coupled by network media. Network nodes include devices such as
computer workstations, servers, network printers, network
scanners, and the like. To harmonize the development and
implementation of protocol stacks, the International Standards
Organization (ISO) promulgated an Open System Interconnection
(OSI) Reference Model that prescribes seven layers of network
protocols.
[0003] Figure 1 is a block diagram 10 of the OSI reference
model. The model includes a hardware layer 12, a data link
layer 14, a network layer 16, a transport layer 18, a session
layer 20, a presentation layer 22, and an application layer 24.
Each layer is responsible for performing a particular task.
Hardware layer 12 is responsible for handling both the
mechanical and electrical details of the physical transmission
of a bit stream. Data link layer 14 is responsible for handling
the packets, including any error detection and recovery that
occurred in the physical layer. Network layer 16 is responsible
for providing connections and routing packets in the
communication network, including handling the address of
outgoing packets, decoding the address of incoming packets, and
maintaining routing information for proper response to changing
loads. Transport layer 18 is responsible for low-level access
to the network and the transfer of messages between the users,
including partitioning messages into packets, maintaining
packet order, flow control, and physical address generation.
Session layer 20 is responsible for implementing the
process-to-process protocols. Presentation layer 22 is
responsible for resolving the differences in formats among the
various sites in the network, including character conversions,
and half duplex/full duplex (echoing). Finally, application
layer 24 is responsible for interacting directly with the
users. Layer 24 may include applications such as electronic
mail, distributed data bases, web browsers, and the like.

[0004] Before the ISO promulgated the OSI reference model, the
Defense Advanced Research Projects Agency (DARPA) promulgated
the ARPANET reference model. The ARPANET reference model
includes four layers: a network hardware layer, a network
interface layer, a host-to-host layer, and a
process/application layer.
[0005] As their names imply, the OSI reference model and the
ARPANET reference model provide guidelines that designers of
protocols may or may not choose to follow. However, most
networking protocols define layers that at least loosely
correspond to a reference model.

[0006] In the field of computing, there are many popular
protocols used to transmit data between network nodes. For
example, TCP/IP, AppleTalk®, NetBEUI, and IPX are all popular
protocols that are used to transmit data between servers,
workstations, printers, and other devices that are coupled to
computer networks.

[0007] Whether a network node has a single "network client" or
many "network clients", it is common for a network node to use
several transmission protocols. As used herein, the term
"network client" refers to a device (such as a network adapter,
port, or modem) that is used to transmit data between two
network nodes over network media. For example, a typical
computer workstation may use TCP/IP to communicate over the
Internet via a modem, and IPX to communicate with a network
server via a network adapter. Likewise, a printer may be
configured to receive print jobs using either the AppleTalk®
protocol or the NetBEUI protocol over the same network adapter.
Typically, a software routine existing at data link layer 14 or
network layer 16 routes data packets between the network
adapter and the proper protocol stack.
[0008] Various protocols also define methods to verify the
integrity of data transmitted by the protocol. For example,
consider a TCP/IP packet as it arrives at a network client such
as an Ethernet network adapter. The entire Ethernet packet is
protected by a cyclic redundancy check (CRC) code that is
calculated and stuffed into the Ethernet packet by the sending
network adapter, and is used by the receiving network adapter
to verify the integrity of the Ethernet packet. If the
integrity of the packet cannot be verified, the packet is
discarded.
[0009] Encapsulated within the Ethernet packet is the IP
portion of the TCP/IP protocol. The IP portion has a 16 bit
checksum code that protects the IP header. If the integrity of
the IP header cannot be verified, the packet is discarded. The
TCP portion of the TCP/IP protocol is encapsulated within the
IP portion, and has a 16 bit checksum code that protects the
TCP header and the contents of the TCP portion of the packet.
If the integrity of the TCP header or the contents of the TCP
portion cannot be verified, the packet is discarded and the
sender will retransmit the packet after not receiving an
acknowledge packet from the intended recipient.
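
The 16 bit checksum mentioned above can be illustrated with a
short sketch. It assumes the ones'-complement sum conventionally
used by IP and TCP; the function name internet_checksum is
illustrative only and does not appear in the specification.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones'-complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"            # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # big-endian 16-bit words
    while total >> 16:             # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF         # the checksum is the complement of the sum
```

A receiver that runs the same function over the data followed by
the transmitted checksum obtains zero when the data is intact.
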
[0010] In the example discussed above, the integrity of the
Ethernet packet is verified by the networking hardware at
hardware layer 12. Accordingly, this function is performed
quite quickly. However, the higher layers of the protocol stack
are typically implemented by software. Calculating a checksum
using a software routine is considerably slower. In the prior
art, a checksum required by a higher layer of the protocol
stack could not be generated at the hardware layer because the
hardware layer did not have knowledge of the higher layers of
the stack.

[0011] One prior art solution that speeds up the generation of
a checksum at a higher layer of a protocol stack is to use a
hardware checksumming facility that is controlled by the higher
layer of the protocol stack. For example, when a TCP module
seeks to verify the integrity of a TCP header and its
respective data, the TCP module writes to a register of the
hardware checksumming facility to begin the checksumming
process, and polls the facility to determine when the checksum
is complete. While such a solution is faster than a checksum
generated solely by a software routine, there is still a
significant delay while the checksum is generated.
[0012] Another method of calculating checksums using hardware
was disclosed by Snyder et al. in U.S. Pat. No. 5,522,039,
which is entitled "Calculation of Network Data Check Sums by
Dedicated Hardware With Software Corrections." This patent
discloses generating a "gross checksum" for the entire packet
as the packet is transferred via a direct memory access (DMA)
operation between adapter memory and system memory. Higher
layers of the protocol stack then calculate the checksum
required by calculating a "difference checksum" for the
portions of the packet that are not needed, and then
subtracting this difference checksum from the gross checksum to
form a "net checksum", which is the checksum required by that
layer of the protocol stack. Since the difference checksum
calculated by software is calculated over a relatively small
number of bytes, the scheme disclosed by Snyder et al. incurs a
smaller time penalty than other prior art methods.
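
The gross / difference / net relationship can be sketched as
follows. This is an illustration under stated assumptions, not
the method of Snyder et al. as implemented: it assumes a 16-bit
ones'-complement sum and an unneeded portion that is an even
number of bytes at the start of the packet. The names ones_sum
and net_sum are hypothetical.

```python
def ones_sum(data: bytes) -> int:
    """16-bit ones'-complement sum of data (not complemented)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def net_sum(gross: int, difference: int) -> int:
    """Subtract difference from gross in ones'-complement arithmetic.

    Subtraction is done by adding the complement. Note that 0x0000
    and 0xFFFF both represent zero in this arithmetic.
    """
    total = gross + (~difference & 0xFFFF)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total
```

Here the gross sum would be produced during the DMA transfer,
while software only needs to sum the short unneeded portion and
call net_sum.
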
[0013] In U.S. Application Serial No. 08/937,912, which is
entitled "Hardware Checksum Assist For Protocol Stacks" and was
filed on September 25, 1997, Brian M. Dowling et al. disclose a
scheme whereby checksums and other algorithms may be calculated
upon data as it is received by a network client, such as an
Ethernet adapter. This application is incorporated by
reference. As disclosed by Dowling et al., a fly-by checksum
generation unit is embedded in a network client. As the network
client transfers a packet into memory, the fly-by checksum unit
calculates a checksum. Dowling et al. also disclose that the
fly-by checksum unit includes beginning and ending byte
registers, thereby reducing or eliminating the need to use
software to calculate a difference checksum, as disclosed by
Snyder et al.
[0014] Snyder et al. and Dowling et al. each reduce the
overhead required to generate a checksum by reducing the number
of times that the data must be "touched". However, the
mechanisms disclosed by Snyder et al. and Dowling et al. must
be provided for each network client. In contrast, software
routines and hardware checksumming facilities do not need to be
provided for each network client, but are slower for the
reasons discussed above.

[0015] The number of types of network clients provided in
typical computer systems is proliferating. Network clients
currently available include network adapters (including
Ethernet adapters, token ring adapters, and the like), parallel
ports, serial ports, modems (including standard V.90 phone
modems, DSL modems, cable modems, ISDN modems, and the like),
USB ports, IEEE 1394 ports, IR and RF ports, SCSI ports, and
EIDE ports. In addition, new network client standards continue
to be developed, such as the HomePNA standard that allows
network nodes to be connected via standard phone lines. It is
contemplated that encryption and authentication will be
important components of the HomePNA standard because phone
lines are not secure. Protocols, such as TCP/IP, may be used to
transmit data via many of these network client standards.

[0016] In a network node that includes several network clients,
it is often not practical to include the mechanisms disclosed
by Snyder et al. or Dowling et al. because such a mechanism
must be included with each network client. What is needed in
the art is a method and apparatus that provides the performance
advantages disclosed by Snyder et al. and Dowling et al., yet
does not need to be replicated for each network client.

[0017] The present invention is an algorithmic snoop unit that
snoops interleaved transactions over a shared bus as data is
transmitted via transactions between clients coupled to the
shared bus, and executes various algorithms upon data snooped
from the transactions. An algorithmic snoop unit in accordance
with the present invention is coupled to the shared bus along
with a variety of bus clients, such as a system CPU, storage
unit(s), and one or more network clients. The algorithmic snoop
unit itself is also a bus client.

[0018] Within the algorithmic snoop unit of the present
invention are one or more algorithmic entries along with an
algorithmic engine. Each algorithmic entry includes a series of
algorithmic entry control information registers and a result /
temporary storage unit. The algorithmic entry control
information registers include a client ID register that
identifies the client associated with a transaction, a starting
address register and an ending address register that define the
address range upon which an algorithm will be executed, a read
or write flag that identifies whether the transaction is a read
or write operation, an encryption key register for holding an
encryption key, a decryption key register for holding a
decryption key, an algorithm ID register for identifying an
algorithm to be executed, and a status / control register which
holds various status and control bits. The result / temporary
storage unit includes an accumulator for accumulating results
from the execution of the algorithm, a temporary storage area,
and one or more memory pointers that index a location in memory
where results comprising a large amount of data (such as the
results of a cryptographic algorithm) may be stored.
[0019] The algorithmic engine includes a bus separation unit,
an algorithmic entry match unit, and an algorithmic calculation
unit. The bus separation unit separates a bus transaction into
address, data, client ID, and read / write status components.
The algorithmic match unit compares these components with the
contents of corresponding registers of each active algorithmic
entry. If a match is found, the algorithm identified by the
algorithm ID register is executed upon the data carried by the
transaction. More than one algorithm may be executed for each
transaction.
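
The comparison performed by the match unit can be sketched in
software. The class and field names below are hypothetical
stand-ins for the registers and bus components named above, and
the client IDs and addresses in the usage note are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """Components produced by the bus separation unit."""
    client_id: int
    address: int
    is_write: bool
    data: bytes


@dataclass
class AlgorithmicEntry:
    """A subset of the control information registers of one entry."""
    client_id: int
    start_address: int
    end_address: int
    match_on_write: bool
    algorithm_id: str
    active: bool = True


def matching_entries(txn, entries):
    """Return every active entry whose registers match the transaction."""
    return [
        e for e in entries
        if e.active
        and e.client_id == txn.client_id
        and e.match_on_write == txn.is_write
        and e.start_address <= txn.address <= e.end_address
    ]
```

Because the function returns a list, two entries that watch the
same client and address range (for example, one checksum entry
and one decryption entry) both match a single transaction, which
corresponds to executing more than one algorithm per transaction.
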
[0020] One of the primary benefits provided by the present
invention is that the execution of as many algorithms as
desired may be "piggy backed" upon a memory transfer that must
occur anyway. Since an algorithmic snoop unit in accordance
with the present invention does not depend on the system CPU to
execute algorithms, and existing transactions are snooped to
provide data to the algorithmic snoop unit, very few additional
system resources are used and execution of the algorithms is
complete as soon as the memory transfer is complete.

Figure 1 is a block diagram of an Open System Interconnection
(OSI) reference model.
Figure 2 is a simplified block diagram of a system in
accordance with the present invention.
Figure 3 is a block diagram of an algorithmic snoop unit shown
in Figure 2, in accordance with the present invention.
Figure 4 is a block diagram of algorithmic entry control
information registers of an algorithmic entry of the
algorithmic snoop unit of Figure 3.
Figure 5 is a block diagram of a result / temporary storage
unit of the algorithmic entry of the algorithmic snoop unit 28
of Figure 3.
Figure 6 is a block diagram of an algorithmic engine, which is
part of the algorithmic snoop unit of Figure 3.

[0021] The present invention executes algorithms that perform
functions, such as checksum functions that verify integrity and
encryption functions that provide security, upon data by
snooping interleaved transactions over a shared bus as the data
is transmitted via the transactions. As used herein, the term
"interleaved transactions" refers to the fact that the bus
transactions carrying a particular stream of data need not be
contiguous. In other words, different bus clients may own the
bus during the period in which the stream is being transmitted.
For example, if two data streams are being transmitted over a
bus during the same time interval, the bus transactions may be
interleaved such that the first, third, fourth, and eighth bus
transactions carry data that is part of the first stream, and
the second, fifth, sixth, and seventh bus transactions carry
data that is part of the second stream.
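
The interleaving example above can be sketched with one
accumulator per stream. The stream labels, data values, and the
simple 32-bit additive accumulator are assumptions chosen for
illustration only.

```python
def accumulate_streams(transactions):
    """Sum data words per stream, regardless of how transactions interleave."""
    accumulators = {}
    for stream_id, word in transactions:
        previous = accumulators.get(stream_id, 0)
        accumulators[stream_id] = (previous + word) & 0xFFFFFFFF
    return accumulators


# Bus order from the example: the first, third, fourth, and eighth
# transactions belong to stream "A"; the second, fifth, sixth, and
# seventh belong to stream "B".
bus_order = [("A", 1), ("B", 10), ("A", 2), ("A", 3),
             ("B", 20), ("B", 30), ("B", 40), ("A", 4)]
```

Each stream's result depends only on its own transactions, so the
order in which the two streams share the bus does not matter.
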

[0022] The present invention could be especially useful in
devices such as a Hewlett-Packard JetDirect print server,
wherein multiple network clients are integrated into a single
device. However, the present invention may be advantageously
used in any system wherein a plurality of bus clients
communicate via a shared bus, and transactions carried by the
bus can be snooped.

[0023] Figure 2 is a simplified block diagram of a system 26 in
accordance with the present invention. System 26 includes
algorithmic snoop unit 28, system CPU 30, system memory 32,
network clients 34, 36, and 38, storage unit 40, and shared bus
42. Network clients 34, 36, and 38 may be any type of network
client known in the art. For example, network client 34 may be
a HomePNA network adapter, network client 36 may be an IEEE
1394 adapter, and network client 38 may be an Ethernet adapter.
Of course, these examples are merely representative, and the
network clients may be any type of device that transmits data
between computing devices over network media. For example, if
system 26 is a Hewlett-Packard JetDirect print server, it would
be desirable to have one of the network clients be an IEEE 1284
parallel port or a serial port.
[0024] Storage unit 40 represents any type of storage unit
known in the art, such as a hard disk drive, a CD-ROM drive, or
a floppy disc drive. Storage unit 40, algorithmic snoop unit
28, system CPU 30, system memory 32, and network clients 34,
36, and 38 are generically referred to as bus clients. Each bus
client is capable of transferring data between itself and
another bus client via shared bus 42.

[0025] Shared bus 42 represents any interconnection fabric
through which data flows between the bus clients (algorithmic
snoop unit 28, system CPU 30, system memory 32, network clients
34, 36, and 38, and storage unit 40). The main requirement is
that algorithmic snoop unit 28 be capable of snooping
transactions between the bus clients. Note that in one
particular configuration described in greater detail below, it
is desirable to have algorithmic snoop unit 28 snoop its own
memory transactions. Network clients 34, 36, and 38,
algorithmic snoop unit 28, and storage unit 40 will typically
transmit data using direct-memory access (DMA) operations, as
is known in the art.
[0026] System 26 may be implemented on a single integrated
circuit microcontroller. For example, system CPU 30 may be
implemented as an ARM CPU macrocell provided by ARM Ltd., with
shared bus 42 adhering to the ARM Advanced System Bus (ASB)
specification of the Advanced Microcontroller Bus Architecture
(AMBA) specified by ARM Ltd.

[0027] The ASB is non-multiplexed and includes separate address
and data buses, as well as a series of bus control lines. The
read / write status of the transaction and the identity of the
client are easily derived from the bus control lines. In this
example, wherein shared bus 42 is an ASB-AMBA bus, algorithmic
snoop unit 28 monitors bus 42 to determine when a bus client is
writing to another bus client, such as a particular memory
range in system memory 32. Based on the bus client writing the
data and the bus client to which data is being written, unit 28
may perform an algorithmic function on the data as the data is
being transferred over bus 42.
[0028] Now consider that system 26 represents a computer system
based on a PCI bus architecture, and network clients 34, 36,
and 38 are implemented as PCI cards. In such a configuration
the PCI cards may or may not be on the same PCI bridge, so it
may not be possible to snoop transactions between the network
clients and system memory 32 from another PCI slot. If the
network clients are all on the same bridge, then algorithmic
snoop unit 28 may exist on a PCI card that is also on the same
bridge. On the other hand, if the network clients are on
different (or individual) bridges, then unit 28 must be
provided in a location within the bus fabric that allows unit
28 to snoop the transactions. Those skilled in the art will
recognize the location within the bus fabric wherein unit 28
will be able to snoop transactions between the bus clients.
Also note that if system 26 contains multiple CPUs and/or
multiple cache levels, unit 28 may need to be provided with
cache coherency mechanisms to snoop CPU transactions occurring
within cache memories or among different CPUs. Such cache
coherency mechanisms are known in the art.
[0029] Figure 3 is a block diagram of algorithmic snoop unit
28. Unit 28 includes algorithmic engine 44, which is coupled to
shared bus 42, and algorithmic entries 46, 48, and 50. Each
algorithmic entry includes a result / temporary storage unit
and algorithmic entry control information registers. For
example, algorithmic entry 48 includes result / temporary
storage unit 52 and algorithmic entry control information
registers 54. Each algorithmic entry is coupled to algorithmic
engine 44. Various registers and entries of algorithmic snoop
unit 28 are accessible by system CPU 30 of Figure 2 via shared
bus 42, as will be described in greater detail below.

[0030] Figure 3 includes N algorithmic entries. The number of
algorithmic entries provided is based on the number of bus
clients, the number of simultaneous transfers supported by each
bus client, and the number of algorithms desired to be
executed. Basically, a designer implementing the present
invention will want to provide enough algorithmic entries to
allow all desired algorithms to be executed simultaneously. For
example, consider an encrypted data packet being received from
network client 34 in Figure 2. Further assume that the
encrypted data packet is part of a stream of data that is
arriving at network client 34 and is being transmitted out at
network client 36. In this example, it may be desirable to
calculate a checksum for the incoming data from network client
34, decrypt the data from network client 34, and calculate a
checksum in preparation for transmitting the data to network
client 36. Accordingly, in this example, three algorithmic
entries will be required. Performing an algorithm operation
upon incoming data in anticipation of sending the data to
another bus client will be discussed in greater detail below.



[0031] Figure 4 is a block diagram of algorithmic entry control
information registers 54 of algorithmic entry 48 of algorithmic
snoop unit 28 of Figure 3. Of course, Figure 4 represents the
algorithmic entry control information registers of each of the
algorithmic entries. Each of the registers is accessible by
system CPU 30 of Figure 2 over shared bus 42. Registers 54
include client ID register 56, starting address register 58,
ending address register 60, read or write flag 62, encryption
key register 64, decryption key register 66, algorithm ID
register 68, and status / control register 70.

[0032] Client ID register 56 defines the client that is reading
data from or writing data to shared bus 42 in Figure 2. In
accordance with the ASB-AMBA bus specification, the client ID
is easily derived from bus control lines of shared bus 42.
Other bus specifications may indicate the client ID in
different ways, such as a multiplexed transaction. Those
skilled in the art will recognize how to extract the client ID
from a bus transaction for any given bus architecture. Note
that algorithmic snoop unit 28 may itself be a client that is
identified by the contents of client ID register 56. This will
be described in greater detail below.
[0033] Starting address register 58 specifies the starting
address upon which the algorithm will be executed. However, it
may be desirable to have starting address register 58 specify
three different values. First, it may be desirable to have
starting address register 58 store the first address of a
buffer in memory which has been configured to receive an
incoming packet. This would be desirable, for example, if unit
28 configures algorithmic entries by snooping a configuration
dialog between a CPU and a network client, as described in
greater detail below. Second, as stated above, starting address
register 58 must specify the starting address upon which the
algorithm will be executed. Note that the starting address upon
which the algorithm will be executed may be specified as an
absolute address, or alternatively, as an offset from the first
address of the buffer. Finally, starting address register 58
may include a word offset to indicate the position within the
first word that should actually be used as the starting point
for the calculation. For example, consider an Ethernet packet
that includes a TCP/IP packet. If one desires to calculate a
TCP/IP checksum on the data contained within the Ethernet
packet, an offset from the first address of the buffer will be
required to reach the data portion of the TCP/IP packet. If
system 26 in Figure 2 addresses memory in 32-bit or 64-bit
increments, the starting point may not fall on an even address
boundary and it may also be necessary to specify a word offset.
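
The relationship between the buffer address, the byte offset,
and the word offset can be sketched as follows. The function
name locate_start and the buffer address in the usage note are
hypothetical; the 14-byte figure is the length of an untagged
Ethernet header, used here only as a sample offset.

```python
def locate_start(buffer_base, byte_offset, word_bytes=4):
    """Split a byte offset into a word-aligned address and a word offset.

    word_bytes is 4 for a 32-bit bus and 8 for a 64-bit bus.
    """
    absolute = buffer_base + byte_offset
    word_offset = absolute % word_bytes      # position inside the first word
    word_address = absolute - word_offset    # aligned address seen on the bus
    return word_address, word_offset
```

For a buffer at 0x1000 on a 32-bit bus, data that begins 14
bytes into the buffer starts in the word at 0x100C, two bytes
into that word, so the first two bytes of that word would be
excluded from the calculation.
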
[0034] Typically the starting address (as well as the ending
address below) will be an address in system memory 32 in Figure
2. However, this is not required by the present invention. The
address may also be an address mapped to any of the other bus
clients. For example, the present invention may be used to
execute an algorithm upon data as the data is DMA'd directly
from a network client to a hard disk, provided the data is
being stored in an address of the hard disk controller bounded
by the starting and ending addresses stored in registers 58 and
60, respectively.

[0035] Ending address register 60 specifies the last address
upon which the algorithm will be executed. Similar to starting
address register 58, it may be desirable to have ending address
register 60 specify three different values. First, it may be
desirable to have ending address register 60 store the last
address of a buffer in memory which has been configured to
receive an incoming packet. Note that the last address of the
buffer may be stored as an absolute address or as an offset
from the first address of the buffer stored in starting address
register 58. Second, ending address register 60 must specify
the last address upon which the algorithm will be executed.
Note that the last address upon which the algorithm will be
executed may be specified as an absolute address, or
alternatively, as a negative offset from the last address of
the buffer. Finally, ending address register 60 may include a
word offset to indicate the position within the last word that
should actually be used as the ending point for the
calculation.
[0036] Read or write flag 62 indicates whether the algorithm
should be executed if the bus client identified by client ID
register 56 is writing data or reading data. Alternatively, a
2-bit register could be provided to allow the algorithm to be
calculated if reading data, writing data, or either reading or
writing data. This information is typically carried by a bus
control line of shared bus 42.
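
The 2-bit alternative can be sketched as a pair of flag bits.
The bit assignment and the names Direction and
direction_matches are assumptions for illustration; the
specification does not define an encoding.

```python
from enum import IntFlag


class Direction(IntFlag):
    READ = 0b01    # execute the algorithm on read transactions
    WRITE = 0b10   # execute the algorithm on write transactions


def direction_matches(flag, is_write):
    """Return True if the 2-bit flag enables this transaction direction."""
    observed = Direction.WRITE if is_write else Direction.READ
    return bool(flag & observed)
```

Setting both bits (Direction.READ | Direction.WRITE) selects the
"either reading or writing" case.
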
[0037] Encryption key register 64 and decryption key register
66 hold encryption and decryption keys to be used if the
algorithm is a cryptographic algorithm. Two keys may be
required if the algorithm being executed is a combined
decryption / encryption algorithm. Such an algorithm may be
desirable if incoming data is being decrypted, and the
decrypted data is being encrypted in preparation for
transmission to another bus client. A combined decryption /
encryption algorithm may also be performed by assigning the
decryption portion of the algorithm to one algorithmic entry,
assigning the encryption portion of the algorithm to a second
algorithmic entry, and setting the client ID register of the
second algorithmic entry to the algorithmic snoop unit 28
itself. As the data is decrypted in accordance with the
parameters stored in the first algorithmic entry and unit 28
stores the decrypted data in a first memory range, unit 28 will
snoop its own bus transactions and encrypt the data in
accordance with the parameters stored in the second algorithmic
entry. This will be described in greater detail below.

[0038] Algorithm ID register 68 holds an algorithm identifier
that indicates which algorithm will be executed. Finally,
status / control register 70 includes various flags that are
required to control execution of the algorithm, and perform
other housekeeping tasks. Flags within status / control
register 70 include, but are not limited to, an active /
inactive flag to indicate whether algorithmic engine 44 of
Figure 3 should execute an algorithm in accordance with the
contents of registers 54 of the algorithmic entry, a finished
flag that indicates whether the ending address has been
reached, and an error flag that indicates whether an error
condition has been encountered.

[0039] Figure 5 is a block diagram of result / temporary
storage unit 52 of algorithmic entry 48 of algorithmic snoop
unit 28 of Figure 3. Of course, Figure 5 represents the result
/ temporary storage unit of each of the algorithmic entries.
Unit 52 includes an accumulator 72, temporary storage 74, and
memory pointers 76 and 78. Accumulator 72 is used to accumulate
checksums, cyclical redundancy checks, or other appropriate
codes wherein the result of executing an algorithm upon a range
of data is a single value or small number of values.

[0040] Temporary storage 74 is used as intermediate storage
when calculating data. For example, if the algorithm being
executed is a decryption algorithm that has a 128-bit
decryption key, several data values from several transactions
may need to be temporarily stored before the decryption key can
be applied to a segment of data.
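
The role of the temporary storage can be sketched as a buffer
that holds snooped bytes until a full segment is available. The
16-byte segment size and the class name BlockBuffer are
assumptions for illustration; the specification states only a
128-bit key, not a segment size.

```python
class BlockBuffer:
    """Collect snooped bytes until whole segments can be processed."""

    def __init__(self, block_bytes=16):
        self.block_bytes = block_bytes
        self.pending = bytearray()   # plays the role of temporary storage

    def push(self, data):
        """Add snooped bytes; return any complete blocks now available."""
        self.pending += data
        blocks = []
        while len(self.pending) >= self.block_bytes:
            blocks.append(bytes(self.pending[:self.block_bytes]))
            del self.pending[:self.block_bytes]
        return blocks
```

With 4-byte bus transactions, push returns an empty list for the
first three transactions and one 16-byte block on the fourth.
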

[0041] Accumulator 72 is used to store algorithmic results of
limited size, such as 8 or 16 bytes. In contrast, memory
pointers 76 and 78 point to regions in memory where longer
results may be stored. For example, if an incoming Ethernet
packet is being decrypted, then the decrypted results may be
stored in the memory range indexed by memory pointer 76. As
each word of data is stored, pointer 76 is incremented to point
to the next address in memory. Furthermore, if the data stored
in the packet is also being encrypted in anticipation of
sending the data to another network client, then the algorithm
can be a combined decryption / encryption algorithm that first
decrypts and then encrypts the data, and the encrypted data may
be stored in the memory range indexed by memory pointer 78. One
implementing the present invention may choose to add additional
memory pointers if required by the desired algorithms.
[0042] As mentioned above, if it is desired to both decrypt and
encrypt incoming data, a single combined decryption /
encryption algorithm can be used that first decrypts data, and
then encrypts the data that was just decrypted. This method
employs a single algorithmic entry and results in algorithmic
snoop unit 28 producing two streams of data, the decrypted
stream and the encrypted stream. Alternatively, two algorithmic
entries may be used. The first entry is configured to snoop
data from the network client, decrypt the data using the
decryption key stored in register 66 of Figure 4, and store the
decrypted data in the memory range indexed by memory pointer 76
of the first entry. The second entry is configured to snoop the
decrypted data being written by algorithmic snoop unit 28
itself. The data snooped by the second entry is the decrypted
data being written to the memory locations indexed by memory
pointer 76 of the first entry. The algorithm ID stored in
register 68 of the second entry invokes an encryption
algorithm, and the encrypted data is stored in the memory
locations indexed by memory pointer 76 of the second entry.
This method may be preferable because it may not be practical
to combine different decryption and encryption algorithms into
a combined decryption / encryption algorithm.

[0043] Note that some cryptographic algorithms
may operate on large blocks of data, in which case
using the memory pointers is appropriate. However,
other cryptographic algorithms may only operate on a
few bytes at a time, in which case the results can be
stored in accumulator 72 and be retrieved periodically
by system CPU 30 in Figure 2.

[0044] Figure 6 is a block diagram of algorithmic
engine 44 of Figure 3, which is part of algorithmic snoop
unit 28 of Figure 3. Algorithmic engine 44 includes bus
element separation unit 80, algorithmic entry match unit
82, and algorithmic calculation unit 84.
[0045] Algorithmic snoop unit 28 snoops every bus
transaction. Accordingly, every bus transaction is
provided to bus element separation unit 80 via shared bus
42. Bus element separation unit 80 separates each bus
transaction into an address that is provided on address
bus 86, a client ID that is provided on client ID bus 88, a
read / write signal that is provided on R/W line 90, and
data which is provided on data bus 98. Note that if all
this information is provided by individual lines of shared
bus 42, the implementation of unit 80 is trivial. However,
if shared bus 42 uses multiplexed or pipelined
transactions, then unit 80 may need to be implemented using
demultiplexers, latches, and similar logic. Those skilled
in the art will recognize how to implement bus element
separation unit 80 for any given bus architecture.
[0046] Address bus 86, client ID bus 88, and R/W
line 90 are provided to algorithmic entry match unit 82.
For each bus transaction, algorithmic entry match unit
82 determines whether an algorithmic entry needs to be
processed. Note that more than one algorithmic entry
may be involved for any given bus transaction. Also note
that the bus transactions carrying a particular stream of
data need not be contiguous. In other words, the
transactions may be interleaved. For example, a particular
algorithmic entry might only be invoked once every 20 or
30 bus transactions, with the other transactions carrying
other data between other sets of bus clients.
[0047] As mentioned above, unit 82 receives the
address, client ID, and read or write status of the
transaction from bus element separation unit 80. Unit 82 also
receives the contents of client ID register 56, starting
address register 58, ending address register 60, read or
write flag 62, and status / control register 70 from each
algorithmic entry in Figure 3 via bus 92. For each
algorithmic entry that is active based on status / control
register 70, unit 82 determines whether the client ID from
the transaction matches the client ID stored in client ID
register 56 of the algorithmic entry, whether the address
of the transaction falls within the range specified by
starting address register 58 and ending address register
60 of the algorithmic entry, and whether the read /
write status of the transaction matches the read / write
flag stored in read or write flag 62 of the algorithmic
entry. If all three of these parameters match for any (or
multiple) algorithmic entries, unit 82 indicates that the
appropriate algorithmic entries should be processed by
asserting a corresponding line of active algorithmic
entries bus 94.
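
The three-way comparison performed by unit 82 can be
sketched in software as follows. This is only an
illustration under stated assumptions: the `Entry` class,
its field names, the treatment of the ending address as
inclusive, and the example addresses are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    client_id: int  # client ID register 56
    start: int      # starting address register 58
    end: int        # ending address register 60 (assumed inclusive)
    write: bool     # read or write flag 62 (True means "write")
    active: bool    # derived from status / control register 70

def active_entries(entries, client_id, address, is_write):
    """Return one boolean per entry, like the lines of bus 94."""
    return [
        e.active
        and e.client_id == client_id
        and e.start <= address <= e.end
        and e.write == is_write
        for e in entries
    ]

entries = [
    Entry(client_id=34, start=0x1000, end=0x17FF, write=True, active=True),
    Entry(client_id=34, start=0x1000, end=0x17FF, write=True, active=False),
    Entry(client_id=28, start=0x2000, end=0x27FF, write=True, active=True),
]
lines = active_entries(entries, client_id=34, address=0x1010, is_write=True)
```

Only the first entry is asserted here: the second is
inactive and the third names a different client and
address range.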

[0048] Algorithmic entry match unit 82 also provides
"partial word per entry" bus 96. For each entry,
bus 96 indicates whether the algorithm should be
executed upon the whole memory word carried by data bus
98, or just a portion of the word. The information carried
by bus 96 is derived from the word offset information
stored in starting address register 58 and ending
address register 60, as discussed above. Of course,
registers 58 and 60 could also be provided directly to
algorithmic calculation unit 84, in which case bus 96
would not be needed because the same information
could be generated within unit 84.
[0049] For each active algorithmic entry specified
by active algorithmic entries bus 94, algorithmic
calculation unit 84 executes the algorithm based on the
transaction data carried by data bus 98 and the operands
stored in algorithmic entry control information registers
54 of Figure 4 and result / temporary storage unit 52 of
Figure 5. Algorithmic calculation unit 84 receives the
contents of encryption key register 64, decryption key
register 66, and algorithmic ID register 68 for each active
algorithmic entry via bus 100. Status / control register
70 is also carried between each algorithmic entry and
unit 84 via bus 100 to provide any status or control
information, and to allow unit 84 to set any error or
condition flags in the registers 70. Algorithmic calculation
unit 84 also has access to accumulator 72, temporary
storage 74, and memory pointers 76 and 78 for each active
algorithmic entry via bus 102.
[0050] Based on the information provided by the
active algorithmic entries and the transaction data
provided by data bus 98, unit 84 executes an algorithm for
each active algorithmic entry. The actual implementation
of algorithmic calculation unit 84 will, of course, vary
based on the types of algorithms supported. For example,
if the present invention is only used to calculate
checksums, then unit 84 would simply be a series of
adders that add the transaction data to the contents of
accumulator 72 and store the results back to accumulator
72. However, if unit 84 provides cryptographic
algorithms, it will obviously be more complex. Those skilled
in the art will recognize how to implement unit 84 to
support the algorithms desired.
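
As a rough software analogue of the adder case, the sketch
below folds each snooped word into a running sum. It
assumes 16-bit words and the ones' complement addition
used by Internet-style checksums; the specification does
not fix a word width, and the function name and sample
words are hypothetical.

```python
def add_to_accumulator(accumulator: int, word: int) -> int:
    """Add one 16-bit word to a 16-bit ones' complement running sum."""
    total = accumulator + word
    # Fold the carry out of bit 15 back into the low bits (end-around carry).
    return (total & 0xFFFF) + (total >> 16)

acc = 0  # plays the role of accumulator 72
for word in (0x4500, 0x0073, 0xFFFF, 0x0001):
    # Each iteration models one matching bus transaction.
    acc = add_to_accumulator(acc, word)
```

Because the sum is commutative and associative, the words
may arrive interleaved with unrelated transactions without
affecting the result.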

[0051] Note that algorithms supported by unit 84
may be implemented in hardware, or unit 84 may
contain programmable elements and the algorithms may be
implemented by software code or microcode. Also, if the
algorithms are implemented as software code or
microcode, the code may be stored in a BIOS routine of
unit 84, or may be stored in unit 84 under control of
system CPU 30 of Figure 2.

[0052] Finally, as discussed above, some of the
algorithms supported by unit 84 may need to store data
in system memory 32 of Figure 2 because the data
stream produced by such algorithms is too large to store
in an algorithmic entry. This was discussed above with
reference to memory pointers 76 and 78 of Figure 5.
Such data is carried by bus 104 to shared bus 42, and
algorithmic snoop unit 28 stores this data in memory
just like any other bus client. As mentioned above and
discussed in greater detail below, algorithmic snoop unit
28 may snoop its own data in accordance with the
contents of algorithmic entries.

[0053] Having discussed the implementation of
algorithmic snoop unit 28 above, it is helpful to understand
how software being executed by system CPU 30 of
Figure 2 can best utilize unit 28. Before giving specific
examples, first consider that unit 28 calculates
algorithms on the fly as data is transferred between bus
clients. The software may or may not know what
algorithms need to be calculated for each transfer
before the transfer begins. For this reason, it is
beneficial to have all of the algorithms provided by unit 28
also be available as software routines.
[0054] For example, consider that a stream of data
being received by an Ethernet adapter using the TCP/IP
protocol is being stored in memory. When the first
packet arrives, CPU 30 does not know what protocol will
be encapsulated within the packet (though it could
guess), and therefore it is possible that no algorithms
will have been configured in algorithmic snoop unit 28 to
calculate the checksum for this packet. Therefore, a
software module that implements the TCP/IP stack may
have to calculate the TCP/IP checksum for the first
packet. Thereafter, the module predicts that subsequent
packets are part of the same data stream, and
configures algorithmic entries to perform the checksum
calculation for each subsequent packet. If the module's
prediction is incorrect, the module simply uses the
software-based version of the appropriate algorithm.
However, if the prediction is correct (as it almost always will
be), the performance benefits provided by the present
invention will be realized.

[0055] Also note that when receiving an Ethernet
packet that includes a TCP/IP packet, it will usually be
the case that the end of the Ethernet packet is not
known until the whole packet arrives. In this situation,
the checksum may include a few bytes beyond the end
of the TCP/IP packet, such as the MAC CRC. The
software module can handle this situation by treating the
checksum as a "gross checksum" and subtracting a
"difference checksum" to reach the TCP/IP checksum, as is
known in the art and discussed above.
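
The gross / difference correction can be worked through
with a small example. The sketch assumes a 16-bit ones'
complement sum and assumes the extra trailing bytes are
aligned to a 16-bit boundary; the helper names and data
values are hypothetical.

```python
def oc_add(a: int, b: int) -> int:
    """16-bit ones' complement addition with end-around carry."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)

def oc_sum(words) -> int:
    """Ones' complement sum of a sequence of 16-bit words."""
    acc = 0
    for w in words:
        acc = oc_add(acc, w)
    return acc

def oc_sub(a: int, b: int) -> int:
    """Subtract b from a by adding the ones' complement of b."""
    return oc_add(a, ~b & 0xFFFF)

packet = [0x1234, 0xABCD, 0x0F0F]  # words that belong to the TCP/IP packet
trailer = [0xDEAD, 0xBEEF]         # stand-in for trailing bytes such as a CRC

gross = oc_sum(packet + trailer)   # what the snoop unit accumulated
difference = oc_sum(trailer)       # computed in software over the extra bytes
net = oc_sub(gross, difference)    # sum over the TCP/IP packet alone
```

Note that in ones' complement arithmetic both 0x0000 and
0xFFFF represent zero, so software comparing sums should
treat the two values as equal.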
[0056] Executing an algorithm upon incoming data
is fairly straightforward. The software module that
predicts that the algorithm will be needed simply configures
an algorithmic entry of snoop unit 28 to execute the
algorithm. However, performing an algorithm upon
outgoing data is more complex. Consider that a TCP/IP
packet includes a TCP/IP header that includes a
checksum. The TCP/IP header comes before the TCP/IP
data. Accordingly, as the packet is being transmitted out
by a network client, the checksum must be known
before the whole packet is transmitted. Accordingly, for
a TCP/IP packet, algorithmic snoop unit 28 will not be
able to generate a checksum on the fly because the
checksum will already have been transmitted before unit
28 can finish calculating the checksum.
[0057] There are several ways to address this
problem. One of the foremost benefits provided by the
present invention is that the execution of the algorithm
may be "piggy backed" upon a memory transfer that
must occur anyway. Accordingly, if the outgoing packet
is stored in storage unit 40, algorithmic snoop unit 28
can be configured to snoop the memory transactions of
storage unit 40 as unit 40 writes the packet to system
memory 32 in preparation for sending the outgoing data
to another bus client. Similarly, assume that a software
module detects that incoming data from network client
34 is immediately being transmitted as outgoing data to
network client 36 in Figure 2. This is common in a print
server, where incoming data from a computer is
immediately sent as outgoing data to a printer. In this
situation, the software module can configure one algorithmic
entry to calculate the checksum required for the
incoming data packet, and configure another algorithmic entry
to calculate the checksum required for the outgoing
data. Accordingly, both checksums are calculated as
the incoming data is DMA'd into memory by the network
client that is receiving the data.

[0058] Even assuming that it is not possible to
identify a memory transfer upon which execution of the
algorithm can be "piggy backed", algorithmic snoop unit 28
can still be configured to snoop memory read
transactions of system CPU 30 for the memory range
containing the outgoing packet. CPU 30 simply reads all
memory locations that contain the packet, and then
retrieves the checksum from unit 28. While this method
requires that the data be "touched", CPU cycles are still
saved because the CPU need only read the data, and
does not have to calculate the checksum because the
checksum is calculated concurrently by unit 28.
[0059] As a final example, consider that system 26
of Figure 2 is a Hewlett-Packard JetDirect print server,
network client 34 is an Ethernet adapter, and network
client 36 is a HomePNA adapter. Assume that the
Ethernet adapter is coupled to a computer via an
Ethernet connection, the HomePNA adapter is coupled to a
printer by a HomePNA connection, data sent over the
HomePNA connection must be encrypted, and TCP/IP
is used to send data over both adapters. Further
assume that the computer is sending data to the printer,
several packets have been transmitted, and the relevant
software modules executed by system CPU 30 have
detected that the data stream described above is active.
First, system CPU 30 defines a region in memory to
receive the incoming packets. Such memory regions
are often referred to in the art as "buffers", and each
buffer will typically be a couple of kilobytes. In this
example, the buffer defined by CPU 30 will be referred to as
the "first buffer".

[0060] Next, CPU 30 will configure a first algorithmic
entry to calculate the TCP/IP checksum for the
incoming packet. Client ID register 56 will be set to refer to
network client 34, starting address register 58 will be set
to refer to the location within the buffer where the
beginning of the TCP/IP data is expected to be stored, ending
address register 60 will be set to the end of the buffer,
read or write flag 62 will be set to "write", and
algorithmic ID register 68 will be set to "checksum". Finally,
status / control register 70 will be set to indicate that the
entry is active.

[0061] Next, CPU 30 will configure a second
algorithmic entry to encrypt the data in anticipation of
sending the data to the HomePNA adapter. Again, client ID
register 56 will be set to refer to network client 34,
starting address register 58 will be set to refer to the location
within the buffer where the beginning of the TCP/IP data
is expected to be stored, ending address register 60 will
be set to the end of the buffer, and read or write flag 62
will be set to "write". Furthermore, the encryption key will
be stored in encryption key register 64, the algorithm ID
register will be set to indicate the proper encryption
algorithm, memory pointer 76 of Figure 5 will be set to
point to a second buffer in system memory 32 to receive
the encrypted data from algorithmic snoop unit 28, and
status / control register 70 will be set to indicate that the
entry is active.

[0062] Finally, CPU 30 will configure a third
algorithmic entry to calculate the TCP/IP checksum of the
encrypted data in anticipation of sending the encrypted
data to the HomePNA adapter. Since the encrypted
data is generated by algorithmic snoop unit 28, unit 28
must be configured to snoop its own transactions to
calculate the checksum. Accordingly, client ID register 56
will be set to refer to algorithmic snoop unit 28, starting
address register 58 will be set to refer to the same
location as memory pointer 76 of the second algorithmic
entry (the beginning of the second buffer), ending
address register 60 will be set to the end of the second
buffer, read or write flag 62 will be set to "write", and
algorithmic ID register 68 will be set to "checksum".
Finally, status / control register 70 will be set to indicate
that the entry is active.
[0063] Now that all three algorithmic entries have
been configured, assume that the Ethernet adapter
(network client 34 in this example) begins to receive a
data packet. As the data is received, the Ethernet
adapter DMAs the data into the first buffer, with
algorithmic snoop unit 28 snooping each transaction. When a
memory transaction carrying the first word of TCP/IP
data occurs, algorithmic entry match unit 82 of Figure 6
will detect that the first and second algorithmic entries
are active. When the second algorithmic entry is
processed, algorithmic snoop unit 28 will initiate a bus
transaction (either at this point or after a few more words are
received from the Ethernet adapter) to store the
encrypted data in the second buffer. Unit 28 will snoop
its own transaction, and the transaction generated by
the second algorithmic entry will cause the third
algorithmic entry to become active, thereby calculating the
TCP/IP checksum for the outgoing encrypted data.
[0064] This process will continue for each word of
the packet until all bytes of the packet are received.
When the packet is received, the appropriate software
modules will inactivate the three algorithmic entries,
thereby making them available for another calculation.
The appropriate software module will also access
accumulator 72 of Figure 5 of the first algorithmic entry to
retrieve the incoming TCP/IP checksum. The retrieved
checksum will then be compared to the checksum
included with the packet to verify the integrity of the
packet.

[0065] Next, the outgoing TCP/IP checksum will be
retrieved from accumulator 72 of the third algorithmic
entry. The outgoing checksum can be stuffed into the
proper location of the second buffer, and the HomePNA
adapter can be signaled to transmit the encrypted data
stored in the second buffer.
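
The interplay of the three entries in the example above
can be imitated with a toy simulation. Everything here is
an assumption made for illustration: the buffer addresses
and size, the 16-bit word width, the ones' complement
checksum, and in particular the XOR "cipher", which is a
placeholder and not a real encryption algorithm. Only the
client numbers 28 and 34 are borrowed from the figures.

```python
SNOOP_UNIT, ETHERNET = 28, 34             # client IDs borrowed from the figures
BUF1, BUF2, SIZE = 0x1000, 0x2000, 0x100  # hypothetical buffer layout
KEY = 0x5A5A                              # placeholder key for the toy cipher

def oc_add(a, b):
    """16-bit ones' complement addition with end-around carry."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)

def in_range(addr, base):
    return base <= addr < base + SIZE

def run(words):
    memory = {}
    acc_in = acc_out = 0  # accumulators of the first and third entries
    pointer = BUF2        # memory pointer 76 of the second entry
    # Each transaction is (client ID, address, data); all are writes here.
    queue = [(ETHERNET, BUF1 + 2 * i, w) for i, w in enumerate(words)]
    while queue:
        client, addr, data = queue.pop(0)
        memory[addr] = data
        if client == ETHERNET and in_range(addr, BUF1):
            acc_in = oc_add(acc_in, data)  # first entry: incoming checksum
            # Second entry: "encrypt" and issue the unit's own write,
            # interleaved ahead of the remaining adapter transactions.
            queue.insert(0, (SNOOP_UNIT, pointer, data ^ KEY))
            pointer += 2
        elif client == SNOOP_UNIT and in_range(addr, BUF2):
            acc_out = oc_add(acc_out, data)  # third entry: outgoing checksum
    return memory, acc_in, acc_out

memory, acc_in, acc_out = run([0x0001, 0x0002, 0x0003])
```

When the loop finishes, both checksums and the transformed
copy of the data are available without the CPU having
read the packet.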

[0066] Consider the substantial advantages the
present invention provides over the prior art. As soon as
the incoming packet is received, the incoming and
outgoing checksums are available, the data has been
encrypted, and the outgoing packet is ready to transmit.
All calculations are done in parallel with the reception of
the incoming data, and few extra transactions are
required to execute the algorithms. Even when
algorithmic snoop unit 28 generates memory transactions
(such as when encrypting data), these transactions will
tend to be interleaved with the transactions generated
by the network client (or other bus client) and will not
slow down the reception rate of the network client. Note
that in the above example, even if the incoming data
was encrypted, it could be easily decrypted by using a
combined decryption / encryption algorithm, as
discussed above, or by configuring a fourth algorithmic
entry.

[0067] In contrast, prior art techniques require that
the data be "touched" several times. While some prior
art techniques do provide for the calculation of various
algorithms "on the fly", none approach the problem as
comprehensively and efficiently as the present
invention. The present invention allows many algorithms to
be calculated concurrently by snooping transactions
that must occur anyway.

[0068] Furthermore, the present invention is
extremely flexible. Transactions from any bus entity
(even the algorithmic snoop unit of the present invention
itself) can form the basis of an algorithmic calculation.
Accordingly, the present invention can execute
algorithms on memory-to-memory transfers, network
adapter-to-memory transfers, disk-to-memory transfers,
and the like.

[0069] In the examples above, the algorithmic
entries of algorithmic snoop unit 28 were configured by
CPU 30 in anticipation of a data transfer between bus
clients. However, in another embodiment, even this
minimal task may be eliminated. Assume that for each bus
client in Figure 2, algorithmic snoop unit 28 includes a
"bus client configuration dialog monitor unit". Each
monitor unit is responsible for snooping a configuration
dialog targeted at the bus client to which the monitor unit is
assigned, and possibly configuring an algorithmic entry
based on the snooped configuration dialog. For
example, assume system CPU 30 engages in a dialog with
network client 34 instructing client 34 to store the next
incoming packet in a buffer bounded by memory
addresses A and B. The monitor unit assigned to
network client 34 snoops this dialog, and configures an
algorithmic entry to execute an algorithm (such as a
checksum) upon the incoming packet. After the packet
is received, CPU 30 simply retrieves the algorithm result
from algorithmic snoop unit 28.
[0070] As discussed above, it may be desirable for
starting address register 58 and ending address register
60 to each store three values: an absolute address
which defines a buffer boundary, an offset or absolute
address that indicates the beginning or ending address
of the range upon which the algorithm should be
executed, and a word offset that indicates the beginning or
ending position within a word. One of the reasons this is
desirable is that if an algorithmic entry is configured
by a monitor unit that snoops a configuration
dialog, as described above, the values snooped will include
the buffer boundary. In contrast, the offset or absolute
address that indicates the beginning or ending address
upon which the algorithm should be executed and the
word offset that indicates the beginning or ending
position within a word tend to be based on the protocol.
[0071] For example, assume that CPU 30
configures network client 34 to store the next incoming packet
in a buffer bounded by memory addresses A and B.
Further assume that all incoming packets transmit data
using either the TCP/IP protocol or the NetBEUI
protocol. A monitor unit snooping the configuration dialog will
only observe one dialog. However, the monitor unit can
configure two algorithmic entries, one for each protocol.
The buffer boundaries will be memory addresses A and
B for each entry, but the range upon which the algorithm
will be executed will vary with protocol.
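
A monitor unit that turns one snooped dialog into one entry
per protocol might be modeled as below. This is a sketch
under assumptions: the offset values, class and function
names, and buffer addresses are illustrative only and do
not come from the specification or from either protocol's
definition.

```python
from dataclasses import dataclass

# Hypothetical per-protocol byte offsets from the start of the buffer to
# the first byte the algorithm should cover; the values are illustrative.
PROTOCOL_OFFSETS = {"TCP/IP": 34, "NetBEUI": 17}

@dataclass
class Entry:
    protocol: str
    buffer_start: int  # buffer boundary snooped from the dialog (address A)
    buffer_end: int    # buffer boundary snooped from the dialog (address B)
    range_start: int   # where the algorithm begins, derived from the protocol

def configure_from_dialog(addr_a: int, addr_b: int):
    """Build one entry per expected protocol from a single snooped dialog."""
    return [
        Entry(protocol, addr_a, addr_b, addr_a + offset)
        for protocol, offset in PROTOCOL_OFFSETS.items()
    ]

entries = configure_from_dialog(0x4000, 0x47FF)
```

Both entries share the snooped boundaries A and B and
differ only in the protocol-dependent start of the range.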
[0072] One of the truly wonderful advantages
provided by the present invention in general (and the above
embodiment in particular) is that the present invention
can calculate many algorithms upon a single data
stream simultaneously. Accordingly, algorithmic snoop
unit 28 may be configured to execute an algorithm for
every protocol and data stream format expected to be
used. For example, after a packet has been received,
the packet in the buffer is examined to determine the
protocol used. Thereafter, CPU 30 retrieves the
appropriate value from the algorithmic entry corresponding to
the protocol. Every algorithm that could possibly be
used may be executed without incurring a performance
penalty.

[0073] The present invention provides a significant
performance advantage compared to prior art methods
of calculating packet checksums. For example, the
present invention provides approximately a 30%
improvement over memory-to-memory hardware-based
checksumming facilities, and approximately a 15%
improvement over fly-by checksumming methods, such
as that disclosed by Brian M. Dowling et al.
[0074] Those skilled in the art will recognize other
applications wherein the present invention may be
advantageously applied. For example, the present
invention could be used to decode an MPEG video data
stream as the stream is being transferred into memory.
In conclusion, the present invention provides a whole
new class of data processing possibilities wherein
algorithms can be executed upon data that is "in transit"
upon a shared bus.

[0075] Although the present invention has been
described with reference to preferred embodiments,
workers skilled in the art will recognize that changes
may be made in form and detail without departing from
the scope of the invention.

Claims 

1. An algorithmic snoop unit coupled to a shared bus
and capable of executing algorithms upon data
carried by bus transactions comprising:

one or more algorithmic entries, with each
algorithmic entry holding information that
indicates whether an algorithm should be
calculated for a particular bus transaction; and
an algorithmic engine that compares
information carried in a bus transaction with the
information held in each algorithmic entry to
determine if the algorithmic entry should be
active for that transaction, and executes an
algorithm based on data carried by the bus
transaction for each active algorithmic entry.

2. The algorithmic snoop unit of claim 1 wherein each
algorithmic entry comprises:

a result / temporary storage unit; and

a set of algorithmic entry control registers.

3. The algorithmic snoop unit of claim 2 wherein the
set of algorithmic entry control registers includes:

a client ID register that stores a value
identifying a bus client of the bus transaction;

a starting address that stores a value indicating
a starting address of a range of addresses
upon which an algorithm should be executed;
and

an ending address register that stores a value
indicating an ending address of the range of
addresses upon which an algorithm should be
executed.

4. The algorithmic snoop unit of claim 3 wherein the
set of algorithmic entry control registers further
includes:

a read or write flag that stores a value that
indicates whether an algorithm should be executed
if the bus transaction carries a read operation
or a write operation.

5. The algorithmic snoop unit of claim 4 wherein the
set of algorithmic entry control registers further
includes:

a status / control register that stores values
related to the status and control of the
algorithmic snoop unit; and

an algorithmic ID register that stores a value
that identifies an algorithm that should be
executed upon data contained in the bus
transaction.

6. The algorithmic snoop unit of claim 5 wherein the
set of algorithmic entry control registers further
includes:

a cryptographic key register that stores a
cryptographic key to be used in conjunction with a
cryptographic algorithm.

7. The algorithmic snoop unit of claim 3 wherein the
result / temporary storage unit includes:

an accumulator for accumulating results from
algorithms calculated upon one or more bus
transactions.

8. The algorithmic snoop unit of claim 7 wherein the
result / temporary storage unit further includes:

temporary storage for storing temporary data
during execution of algorithms; and
one or more memory pointers that index a
location in memory where results produced by
execution of algorithms may be stored.

9. The algorithmic snoop unit of claim 8 wherein the
algorithmic snoop unit is capable of snooping its
own bus transactions.

10. The algorithmic snoop unit of claim 1 wherein the
algorithmic engine comprises:

a bus element separation unit that extracts a
client ID, an address, and data from a bus
transaction;

an algorithmic entry match unit that compares
information extracted from the bus transaction
with information stored in each algorithmic
entry, and determines which algorithmic entries
are active for a particular bus transaction; and
an algorithmic calculation unit that executes an
algorithm for each active entry.